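The research reported in this paper turns an ordinary trash bin into a smarter one by applying computer vision. Supported by sensors and actuators, the bin sorts garbage automatically. Specifically, a camera on the bin photographs each piece of trash, and the central processing unit analyzes the image and decides which compartment the trash should go into. Our bin system reaches an accuracy of 90%. In addition, our model is connected to the Internet so that the bin status can be updated for further management. A mobile application for managing the bins was also developed.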
translated by Google Translate
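We present an open-source toolkit for neural machine translation (NMT). The new toolkit is based mainly on the vaulted Transformer (Vaswani et al., 2017), together with a number of other improvements detailed below, with the aim of creating a self-contained, easy-to-use, consistent, and comprehensive framework for machine translation tasks in various domains. It is built to support both bilingual and multilingual translation tasks, from building a model on the respective corpora to inferring new predictions or packaging the model in a JIT format suitable for serving.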
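A significant shortcoming of current state-of-the-art (SOTA) named entity recognition (NER) systems is their lack of generalization to unseen domains. This is a major problem because obtaining labeled NER data for a new domain is expensive and time-consuming. We propose ZERO, a model that performs zero-shot and few-shot learning in NER and generalizes to unseen domains by incorporating pre-existing knowledge in the form of semantic word embeddings. ZERO first obtains contextualized word representations of the input sentences using the model LUKE, reduces their dimensionality, and compares them directly with the embeddings of the external knowledge, which allows ZERO to be trained to recognize unseen output entities. We find that ZERO performs well on unseen NER domains, with an average macro F1 score of 0.23, and even achieves competitive scores in an in-domain comparison. Performance across source-target domain pairs is shown to be inversely related to the pairs' KL divergence.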
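The KL divergence between a source and a target domain can be illustrated with a minimal sketch. The abstract does not say which distributions the authors compare, so the smoothed unigram distributions used here, the function name `kl_divergence`, and the `alpha` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(source_tokens, target_tokens, alpha=1.0):
    """KL(P_source || P_target) over the joint vocabulary.

    Assumption: each domain is summarized by its unigram distribution,
    with add-alpha smoothing so that no probability is zero.
    """
    vocab = set(source_tokens) | set(target_tokens)
    src, tgt = Counter(source_tokens), Counter(target_tokens)
    src_total = len(source_tokens) + alpha * len(vocab)
    tgt_total = len(target_tokens) + alpha * len(vocab)
    kl = 0.0
    for word in vocab:
        p = (src[word] + alpha) / src_total
        q = (tgt[word] + alpha) / tgt_total
        kl += p * math.log(p / q)
    return kl


sports = "the striker scored a goal".split()
politics = "the senator passed a bill".split()
print(kl_divergence(sports, sports))    # identical domains give zero divergence
print(kl_divergence(sports, politics))  # differing domains give a positive value
```

Under the relationship reported in the abstract, a domain pair with a larger value of this kind would be expected to transfer less well.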
In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors comprise two depth cameras and a LiDAR, arranged so that they capture the whole 3D area in front of the robot and a 2D slice around it. To fuse the data from these sensors, we first use an external camera as a reference to combine the data from the two depth cameras. A projection technique is then introduced to convert the cameras' 3D point cloud data into a corresponding 2D representation. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate the proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments.
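One common way to reduce a 3D point cloud to 2D is to collapse it into a planar range scan, sketched below. The abstract does not describe the authors' projection technique, so this is an assumed stand-in; the function name `project_to_scan`, the height limits, and the bin count are hypothetical.

```python
import math


def project_to_scan(points, num_bins=360, z_min=0.05, z_max=1.5):
    """Collapse 3D points (x, y, z) in the robot frame into a 2D range scan.

    Each angular bin keeps the smallest horizontal distance observed at
    that bearing; bins with no obstacle stay at infinity.
    """
    ranges = [math.inf] * num_bins
    bin_width = 2 * math.pi / num_bins
    for x, y, z in points:
        if not (z_min <= z <= z_max):
            continue  # skip points outside the assumed height band
        dist = math.hypot(x, y)
        angle = math.atan2(y, x) % (2 * math.pi)
        idx = int(angle / bin_width) % num_bins
        ranges[idx] = min(ranges[idx], dist)
    return ranges


cloud = [(1.0, 0.1, 0.5), (2.0, 0.1, 0.5), (-0.1, 3.0, 0.5), (1.0, 1.0, 5.0)]
scan = project_to_scan(cloud, num_bins=8)
print(scan)
```

A scan in this form has the same shape as 2D LiDAR output, which is one reason such a projection makes the two sensor types easier to combine.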
We introduce an approach to the answer-aware question generation problem. Rather than relying only on the capability of strong pre-trained language models, we observe that information about answers and questions can be found in a few relevant sentences of the context. Based on this observation, we design a model with two modules: a selector and a generator. The selector forces the model to focus more on the sentences relevant to an answer, providing implicit local information. The generator generates questions by implicitly combining local information from the selector with global information from the whole context, as encoded by the encoder. The model is trained jointly to take advantage of latent interactions between the two modules. Experimental results on two benchmark datasets show that our model outperforms strong pre-trained models on the question generation task. The code is also available (shorturl.at/lV567).
Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) make it possible to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the ability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is limited to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, which allows long-range dependencies between slices inside 3D volumes to be incorporated. These holistic features are further used to define a novel 3D clustering agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structure segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
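In real-world applications, federated learning (FL) faces two challenges: (1) scalability, especially when applied to massive IoT networks; and (2) robustness in environments with heterogeneous data. To address the first problem, we design a novel FL framework named Full-Stack FL (F2L). More specifically, F2L uses a hierarchical network architecture, which makes it possible to extend the FL network without reconstructing the whole network system. Moreover, leveraging the advantages of the hierarchical network design, we propose a new label-driven knowledge distillation (LKD) technique at the global server to address the second problem. In contrast to current knowledge distillation techniques, LKD can train a student model that combines the good knowledge of all the teacher models. As a result, our proposed algorithm can effectively extract the knowledge of the regional data distributions (i.e., the regional aggregated models) and so reduce the divergence between client models when the FL system operates on non-independent and identically distributed (non-IID) data. Extensive experimental results show that: (i) our F2L method significantly improves overall FL efficiency across all global distillations, and (ii) F2L converges rapidly as the global distillation stages take place, rather than improving gradually with each communication cycle.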
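We present the first empirical study of the influence of disfluency detection on the downstream tasks of intent detection and slot filling. We conduct this study for Vietnamese, a low-resource language for which there is no previous study and no publicly available dataset for disfluency detection. First, we extend the fluent Vietnamese intent detection and slot filling dataset PhoATIS by manually adding contextual disfluencies and annotating them. Then, we run experiments with strong baselines, based on pre-trained language models, for disfluency detection and for joint intent detection and slot filling. We find that: (i) disfluencies have a negative effect on the performance of the downstream intent detection and slot filling tasks, and (ii) in the disfluency setting, the pre-trained multilingual language model XLM-R yields better intent detection and slot filling performance than the pre-trained monolingual language model PhoBERT, which is the opposite of what is generally found in the fluent setting.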
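Taking the wrong medication is a risk that can lead to unpredictable consequences for patients. To mitigate this risk, we develop an automatic system that correctly identifies pills against their prescription from mobile images. Specifically, we define a so-called pill-prescription matching task, which attempts to match images of the pills taken with the pill names in the prescription. We then propose PIMA, a novel approach that uses a graph neural network (GNN) and contrastive learning to address this problem. In particular, the GNN is used to learn the spatial correlation between the text boxes in the prescription and thereby highlight the text boxes that carry pill names. In addition, contrastive learning is used to model the cross-modal similarity between the textual representations of pill names and the visual representations of pill images. We conduct extensive experiments and demonstrate that PIMA outperforms baseline models on a real-world dataset of pill and prescription images that we constructed. Specifically, compared with the other baselines, PIMA improves accuracy from 19.09% to 46.95%. We believe our work can open up new opportunities for building clinical applications and for improving medication safety and patient care.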
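Cross-modal contrastive learning of this kind is often implemented with an InfoNCE-style loss, sketched below. The abstract does not state which loss PIMA uses, so the InfoNCE formulation, the function names `cosine` and `info_nce`, and the `temperature` value are assumptions chosen only to illustrate the idea of pulling matching text and image embeddings together.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(text_embs, image_embs, temperature=0.1):
    """Mean cross-entropy of selecting image i for text i within the batch.

    Assumption: text_embs[i] and image_embs[i] describe the same pill.
    """
    loss = 0.0
    for i, text in enumerate(text_embs):
        logits = [cosine(text, img) / temperature for img in image_embs]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_norm - logits[i]
    return loss / len(text_embs)


names = [[1.0, 0.0], [0.0, 1.0]]   # toy pill-name embeddings
pills = [[1.0, 0.0], [0.0, 1.0]]   # toy pill-image embeddings, aligned
print(info_nce(names, pills))        # small loss for matching pairs
print(info_nce(names, pills[::-1]))  # large loss when pairs are swapped
```

The loss is low when each pill name is most similar to its own image and high otherwise, which is the property a matching system needs at inference time.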
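Recent advances in machine learning have shown that pre-trained representations obtained through self-supervised learning can achieve high accuracy with little training data. Unlike in the vision and natural language processing domains, pre-training for IMU-based applications is challenging, because only a few publicly available datasets have sufficient size and diversity to learn generalizable representations. To overcome this problem, we propose IMG2IMU, a novel approach that adapts representations pre-trained on large-scale images to diverse few-shot IMU sensing tasks. We convert the sensor data into visually interpretable spectrograms so that the model can use the knowledge gained from vision. In addition, we apply contrastive learning, in a form we designed to learn representations suited to interpreting sensor data. Our extensive evaluation on five IMU sensing tasks shows that IMG2IMU consistently outperforms the baselines, which indicates that vision knowledge can be incorporated into a few-shot learning setting for IMU sensing tasks.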
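Converting a sensor stream into a spectrogram can be sketched with a plain short-time Fourier transform. The abstract does not give the exact conversion, so the window length, hop size, Hann window, and the function name `spectrogram` are assumptions, and only a single sensor axis is handled.

```python
import cmath
import math


def spectrogram(signal, window=32, hop=16):
    """Short-time Fourier magnitudes of a 1D signal.

    Returns one row per time frame and one column per frequency bin
    (window // 2 + 1 bins, from 0 up to the Nyquist frequency).
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window - 1))
            for n in range(window)]
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = [signal[start + n] * hann[n] for n in range(window)]
        row = []
        for k in range(window // 2 + 1):
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window)
                        for n in range(window))
            row.append(abs(coeff))
        frames.append(row)
    return frames


# Toy accelerometer axis: an 8 Hz oscillation sampled at 64 Hz for 2 seconds.
rate = 64
accel = [math.sin(2 * math.pi * 8 * t / rate) for t in range(2 * rate)]
spec = spectrogram(accel)
print(len(spec), len(spec[0]))
```

The resulting 2D array can be rendered as an image, with time along one axis and frequency along the other, which is what allows an image-pretrained model to consume it.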